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5h ' Abstract 

We propose a new global optimization method {Simulated Tem- 
pering) for simulating effectively a system with a rough free energy 
landscape (i.e. many coexisting states) at finite non-zero tempera- 
ture. This method is related to simulated annealing, but here the 
temperature becomes a dynamic variable, and the system is always 
kept at equilibrium. We analyze the method on the Random Field 
Ising Model, and we find a dramatic improvement over conventional 
Metropolis and cluster methods. We analyze and discuss the condi- 
tions under which the method has optimal performances. 
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Simulated annealing is an efficient heuristic method which is used to find 
the absolute minimum of functions with many local minima: it has been 
introduced independently in the framework of the Monte Carlo approach for 
discrete variables in ref. [1], and in the framework of stochastic differential 
equations (of Langevin type) for continuous variables in ref. [2]. 

The essence of the method consists of the following. Let us suppose 
that we are interested in finding the minimum of a function H[X), where X 
denotes an element of the configuration space (which has dimension N, where 

is often a very high number). In most cases we do not know any method 
which can guarantee to find the minimum of H{X) with a computational 
effort that docs not increase more than polynomially in N. In these cases one 
can try as a first guess a random search starting from a random configuration 
and minimizing H{X) with a steepest descent algorithm. If the number of 
local minima increases as e^^ , with 7 different from zero, it often happens 
that this method also takes an exponentially large number of trials (i.e. e'^^, 
with in general 5 < 7). 

In the simulated annealing method one considers a /3-dependent algorithm 
which asymptotically generates the configurations X with Gibbs's probability 
distribution, i.e. e~^^^'^^; for definiteness we can consider the case of Monte 
Carlo steps. Simulations at increasing values of /? are done (eventually at 
(3 = 00). Each time f3 is changed the system is driven out of equilibrium, but 
that does not matter since eventually we are interested in the (3 = 00 result. 

In general the simulated annealing method does not have any reason to 
converge to the exact result, i.e. to provide the minimum of H(X). Only 
if we do an asymptotically large number of simulated annealing runs, or if 
the values of P are changed by an infinitesimal amount at each step and 
an infinite amount of Monte Carlo steps are done at each value of /S, the 
simulated annealing method will converge to the exact result and will find the 
minimum of H. But the convergence is guaranteed only if we asymptotically 
invest an infinite amount of computer time. If a reasonable annealing scaling 
is used (/5 is changed by a non-zero amount and only a finite number of Monte 
Carlo cycles are done at a given value of /3) we have no reason to believe that 
this procedure ends up in the global minimum; indeed in the extreme case 
in which /? takes only two values (0 and 00) we find the same result as the 
random search algorithm we have described before. 

The simulated annealing algorithm can however be used as an heuristic 
predictor for the global minimum: one can compare the values of the energy 
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after many simulated annealing runs and if the probability of ending with 
the global minimum is not too small, the simulated annealing turns out to 
be a rather efficient algorithm. Let us note that this efficiency depends a 
lot on the shape of the phase space: if the absolute minimum has a small 
basin of attraction, and is separated from the large local minima by very 
high barriers, simulated annealing does not have any reason to be a good 
algorithm. 

Unfortunately if we want to extend the algorithm to finite temperature 
we are very soon in deep trouble. Indeed if we stop our simulations at a given 
value of < oo, the one we want to use to evaluate observables, different runs 
will give different results (if /? is sufficiently large). In this case we cannot 
just select the runs which produce the configurations with lower energy: at 
T ^ Owe have to minimize the free energy F and not the energy. Estimating 
the entropic contribution is a non-trivial task, and makes a straightforward 
generalization of the simulated annealing impossible. This problem is very 
severe in cases like spin glasses [3] or hetero-polymers folding [4] (maybe 
also peptides [5]) in which there are more than one equilibrium state and 
we are actually interested in knowing the relative weight which the different 
equilibrium states carry in the partition function. 

The method we propose in this note is meant to bypass these difficulties, 
and to constitute a viable scheme to minimize free energy in an effective way. 
It can be regarded as a very efficient global optimization scheme. The basic 
idea of the Simulated Tempering method consists of changing the tempera- 
ture while remaining at equilibrium: this is in contrast with the simulated 
annealing method, where every change of the temperature drives the system 
out of equilibrium. This can be achieved by enlarging the configuration space 
of the system in the following way. 

We define a large configuration space, which is characterized by the vari- 
ables X (the original configuration space) and by a new variable m, which 
can takes M values (m = 1...M). The probability distribution P[X,m) 
will be chosen to be 



where we have absorbed the factor (3 in the definition of the Hamiltonian. 
We choose 




(1) 
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H{X,m)=P^H{X)-g^. (2) 

Here the 13m and the Qm can take arbitrary values we assign a priori. The 
Qm will be a priori assigned constants, and the will be dynamical variables 
which will be allowed to span a set of values given a priori. For simplicity 
we can assume that the 13m are ordered. 

It is evident that the probability distribution induced by the Hamiltonian 
2, restricted to the subspace at fixed m, is the usual Gibbs distribution for 
(3 — 13m- On the other hand the probability of having a given value of m is 
simply given by 

Pm oc Z,„e^'" = e-(*"-^-+f-) , (3) 

where the Zm arc the partition functions at given (3m (i-c. Zm = Z[j3m)) 
and the fm are the corresponding free energies. 
If we make the choice 

9m — Pmfm i (4) 

then all the Pm become equal. 

If our target is to do a simulation at a given value of we can take 
Pm = (3 and with the choice in eq. 4 we can perform a Monte Carlo simulation 
in which we also allow the change of m by 1 unit. In this case the system 
will be with a probability 4 at m = m. Only a fraction 4 of the events will 
be interesting for measuring directly expectation values at /3 (if the use of 
an histogram reconstruction makes also the other (3 values very useful). The 
frequent visits of the system to lower values of (3m will make it decorrelate 
much faster. Indeed at lower (3 values free energy barriers are lower, and the 
system will find it much easier to jump. Then, when it decides to cool off 
again, it will be visiting, with the correct equilibrium probability, a different 
minimum. 

This method may be useful only if the transition from one value of (3m to 
another happens with non-negligible probability. It is evident that if the two 
contiguous values of (3 are too different the probability of accepting a change 
will be rather small, and that, on the contrary, if they are too similar they 
will not help in decorrelating. 

Let us try to compute the probability for going from (3m to (3m+\ = (3m + S- 
If we try to modify P, the variation of the Hamiltonian is given by 
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A// = E 6 ~ {gm+i - g„i) 



(5) 



where E is the instantaneous value of the energy H[X). On the other 
hand we have that gm+i — gm is given by the value of the energy for some P 
in between and (5m+i- More precisely 



where E^^ is E^fim) {E{P) is the expectation value of H{X) as function 
of /?) and Cjn — If we assume that E is very close to Ej^ the variation 
AH will be not too large under the condition that 



One should also consider that there arc thermal fluctuations in the value 
of the energy which are of order of Cm- Condition 7 is equivalent to requiring 
that there is a non-negligible overlap in the values of the energy computed 
at contiguous values of (3^- 

In the usual thermodynamic limit the energy is a quantity of order and 
condition 7 requires that 6 is of order A^~2, which is not a very demanding 
condition. The main difficulty in the method is the required tuning in the 
choice of the gm- Indeed if one takes for the Pm an unreasonable value, the 
simulation could get trapped at a given value of Pm- In this respect it is 
interesting to note that we are not introducing any systematic bias. One 
can also think about the possibility of performing an iterative procedure in 
which the values of the g^'s are adjusted during the simulation, but we will 
see that already with the naive choice we are using one gets very impressive 
results. 

We have applied the Simulated Tempering method to the Random Field 
Ising Model (RFIM), which has many features that are very relevant to our 
case. It has a rough landscape, and the symmetry of the + and the — state 
of the pure Ising model is broken by the random magnetic field. This is 
not a trivial symmetry any more, and the flips from the + to the — sector 
(and back) is an essential part of the dynamics. The state oriented in the + 
direction and the one oriented in the — direction, which macroscopically are 
very similar, from a microscopic point of view are completely different. The 




(6) 



CmS' = 0(1) . 



(7) 
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transition from the favoured state (which is selected by the specific reahzation 
of the magnetic field) to the suppressed one is a rare event. 

For the RFIM an extension of the cluster update method'^' ^1 does not 
give any improvement over the local classical Metropolis methodl^l. The 
system undergoes the usual pathology of freezing already at T > and the 
spins form a large cluster. In no way does the cluster method help in this 
case, for example, to tunnel from a + to a — state. 

We have implemented the Simulated Tempering by proposing one (3 up- 
date at the end of each sweep of the lattice spins. The computational time 
required to compute the /? update is negligible. 

Let us anticipate our results: as we will show in some detail the Simulated 
Tempering method helps a lot. In our test, correlation times for observable 
quantities which are not sensitive to the magnetization decrease by a factor 
of 6 as compared to the Metropolis and the cluster method. As far as the 
estimate of the magnetization is concerned, the method changes the picture 
dramatically, allowing tunneling where the Metropolis method is trapped in 
a single state, and correcting, in some cases, wrong estimates given by the 
Metropolis method. 

The lattice Hamiltonian is 

<i,j> i 

where sites i live on a 3 dimensional lattice of volume V, the sum runs 
over all first neighbors of the lattice, the spins can take values ±1, and 
the site random fields hi take values 

h^ = \h\e,, (9) 

where the 6i take the values ±1 with probability |. 

We have taken in our simulations V = 10^ and \h\ = 1. We have worked 
with a given realization of the random magnetic field. In order to character- 
ize the system in fig. 1 we show the specific heat, and in fig. 2 the magnetic 
susceptibility (as defined, on the finite lattice, from the fluctuations of 
The 3 points with errors are from 3 runs done by using the cluster algorithm, 
while the dotted, dashed and dot-dashed lines are done by using the recon- 
struction method proposed in ref. [9]. The continuous line uses the method 
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of ref. [9] by patching the 3 data points (for details see also ref. [10]). The 
reconstruction is very reliable. 

We have analyzed the measured observables by means of a binning proce- 
dure, obtaining an asymptotic estimate for the errors. We have also focused 

our analysis on the study of r*"*, which is the relevant quantity related to the 
true error over measured observables. Following ref. [11] we use an improved 
estimator for r'™*, taking in account the remainder R: 

1 w-i 

ro\W) = -+ Po{t) + Rom , (10) 



where 



Po{W-l) 

We have taken W up to 20. 

The errors on Tint are, when we quote an asymptotic estimate for them, 
always of the order of 1 on the last digit. We have also monitored that Texp 
gives consistent results (we do not quote it here since it is always noisier than 

Tint)- 

In table 1 we give two of the measured observables: the thermal part of 
the energy (i.e. the expectation value of crjUj), £'t, and the magnetization 
m. Et has a behavior typical of the quantities that are Z2 symmetric. The 
lines called (MC) and (CL) give information about the runs we have done 
with the classical Metropolis method and with the cluster algorithm. These 
runs have been used (together with more MC runs at other (3 values) to get 
a preliminary estimate of the system energy and to determine the values of 
the gm- It is in no way necessary to get, for estimating the gm, more than 
a rough estimate of the Ej^, and in a practical application of the method 
the preliminary MC runs can be very short. It is possible to determine 
directly the values of the e~-^" , by using the energy histograms taken in the 
preliminary runs. Although we stress that this possibility exists, we do not 
think that it could dramatically increase the efficiency of the method. When, 
in table 1, we put errors and Ti„_t in square brackets we mean that we did not 
get an asymptotic estimate. Let us also note now that the MC run at (3 = .26 
gets a wrong expectation value for m. In this case the standard Metropolis 
does not produce any tunneling event, and always stays in the — phase. 
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1.5279(8) 


2.4 


-.320(12) 


105 


301 


.25 (F) 


1.5281(8) 


3.3 
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9 
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Table 1: Thermal energy, magnetization and related integrated autocorre- 
lation times. Errors are in round brackets (). When in square brackets, [], 
error and r*"* estimates are not asymptotic. The value for m given by the 
Metropolis method (MC) at /3 = .26 is wrong. 
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Run 


Pi, 111 


02,112 


P3,n3 


/?4,n4 




B 


.24, 145 


.25, 297 


.26, 158 






C 


.23, 206 


.25, 226 


.27, 167 






D 


.245, 148 


.25, 301 


.255, 151 






E 


.24, 149 


.245, 300 


.25, 301 


.255, 301 


.26, 150 


F 


.23, 159 


.24, 290 


.25, 290 


.26, 306 


.27, 155 



Table 2: /? values allowed in each of our Simulated Tempering runs, and 
number of iterations (in units of 10^) the system spent at each (3 value. For 
historical reasons we label the runs with the capital letters B, C, D, E, F. 

In table 2 we give details about our Simulated Tempering runs. We have 
tried different combinations, allowing the system to take 3 or 5 /3 values, 
always centred around (3 — .25. In table 2 we check the performance of our 
method at the different P values we have allowed in the different simulations. 
The choice of the /5 values has been dictated, as we have discussed before, by 
the requirement of having a non-negligible overlap in the energy histograms 
of the preliminary MC runs. Runs D and E have a very small S value, and 
a high acceptance factor for a /3 update, of ~ 70%. Runs B and F have a 
medium 5, and a /3 acceptance factor of 40 — 50%. Run C has a higher 5 
value and a very low acceptance factor for the f3 update, 10 — 15%. 

In fig. 3 we give f3m as a function of the computer time for system B, where 
3 P values were allowed and the /5 acceptance value is close to 50%. Let us 
start by commenting on the results for m, which are quite spectacular. At 
P — .24 (not so low T) Tm is higher than O(IOO) for Metropolis and Cluster 
methods, and gets down to 32 in the F run. In general runs with a larger 
S value seem to be more effective for improving the estimate of m. Things 
are better and better at lower temperatures. At /3 = .25 from Tm > 700 we 
go down to Tm — 52 in run F, with a gain of a factor larger than 12. At 
P — .255 from Tm > 6000 we go down to 108 in run E, with a gain of a 
factor better than 60. At P = .26 after 200000 steps the Metropohs does not 
succeed in getting a single tunneling event, while our run E has Tm = 52. In 
figs. 4a — c we show what happens. In fig. 4a we give the magnetization as a 
function of computer time for the Metropolis method, for 200000 steps. The 
system stays in the — state, with very large fluctuations which never succeed 
in getting a complete flip. In fig. 46 we plot m for our F system, only 1000 
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steps. Here the data points are at different P values, and it is clear that going 
to different /3 values allows an easy ffipping. In order to malce the situation 
clear in fig. 4c we have selected only the first 10000 configurations, of the F 
dynamics, which happen to be at /? = .26. The picture speaks for itself. 

Also for Ex there is a large gain at all 13 values. One gains a factor 3 at 
(3 = .24, .25, a factor 6 at /3 = .255, and a factor 2.5 at P = .26. In this case 
the best performances are obtained for small 6 values. 

It is a pleasure for us to thank Masataka Fukugita for interesting discus- 
sions, and Paul Coddington for a critical reading of the manuscript. Hard- 
ware and software computer support has been given by Infn (Roma Tor 
Vergata) and NPAC (Syracuse). 
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Figure Captions. 



1 Specific heat Cy. Points from cluster algorithm data, lines from hijstogram 

reconstruction. 

2 As in fig. 1, but susceptibility x- 

3 /3 as a function of the computer time for the runs of series B {Z (3 values 

allowed, 50% acceptance ratio). 

4a — c Magnetization m as a function of computer time. In (a) for the 
Metropolis method at /9 = .26, in (h) m, for the F systems {j3 is here a 
dynamical variable which is allowed to take 5 values during the course 
of the dynamics), in (c) the configurations of run F which have (3 — .26. 
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